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Detecting meaningful structure in neural activity and connectivity data is chal¬ 
lenging in the presence of hidden nonlinearities, where traditional eigenvalue- 
based methods may be misleading. We introduce a novel approach to matrix 
analysis, called clique topology, that extracts features of the data invariant un¬ 
der nonlinear monotone transformations. These features can be used to detect 
both random and geometric structure, and depend only on the relative order¬ 
ing of matrix entries. We then analyzed the activity of pyramidal neurons in 
rat hippocampus, recorded while the animal was exploring a two-dimensional 
environment, and confirmed that our method is able to detect geometric orga¬ 
nization using only the intrinsic pattern of neural correlations. Remarkably, 
we found similar results during non-spatial behaviors such as wheel running 
and REM sleep. This suggests that the geometric structure of correlations 
is shaped by the underlying hippocampal circuits, and is not merely a con¬ 
sequence of position coding. We propose that clique topology is a powerful 
new tool for matrix analysis in biological settings, where the relationship of 
observed quantities to more meaningful variables is often nonlinear and un¬ 
known. 
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Neural activity and connectivity data are often presented in the form of a matrix whose 
entries, C l3 , indicate the strength of correlation or connectivity between pairs of neurons, cell 
types, or imaging voxels. Detecting structure in such a matrix is a critical step towards under¬ 
standing the organization and function of the underlying neural circuits. This structure may 
reflect the coding properties of neurons, rather than their physical locations within the brain. 
For example, pyramidal neurons in rodent hippocampus possess a geometric organization due 
to their role in position coding. Each of these neurons, called place cells, acts as a position sen¬ 
sor, exhibiting a high firing rate when the animal’s position lies inside the neuron’s place field, 
its preferred region of the spatial environment ( 1 ). Because of this organization, the pairwise 
correlations C l3 between place cells decrease as a function of the distances between place field 
centers (2), and the matrix of correlations thus inherits a geometric structure. Alternatively, a 
correlation or connectivity matrix could be truly unstructured, such as the connectivity pattern 
in the fly olfactory system observed in (3), indicating random network organization. 

Can we detect the presence of structure - or randomness - purely from the intrinsic features 
of a matrix Cip. The most common approach is to use standard tools from matrix analysis 
that rely on quantities, such as eigenvalues, that are invariant under linear change of basis. 
This strategy is natural in physics, where meaningful quantities should be preserved by linear 
coordinate transformations. In contrast, structure in neural data should be invariant under matrix 
transformations of the form 

Cij = /(Aij), ( 1 ) 

where / is a monotonically increasing function (Figure la). This is because it is common for 
observed quantities in neuroscience to be related to more meaningful variables by a monotonic 
nonlinearity, such as the relationship between the strength of an imaging signal and the under¬ 
lying neural activity (4), or the relationship between neural activity and represented stimuli. For 
example, in the case of hippocampal place cells, pairwise correlations decrease with distance 
between place field centers in a nonlinear fashion. In less-studied contexts, the relationship of 
neural correlations to the represented stimuli may be completely unknown. 

Unfortunately, eigenvalues are not invariant under transformations of the form (|T]) (e.g. Sup¬ 
plementary Figure 1). Although large random matrices have a reliable eigenvalue spectrum (e.g. 
Wigner’s semicircle law (5)), it is possible that a random matrix with i.i.d. entries could be mis¬ 
taken as structured, purely as an artifact of a monotonic nonlinearity (Figure lb)[j] The results 
of eigenvalue-based analyses can thus be difficult to interpret, and potentially misleading. 

Here we introduce a new tool to reliably detect signatures of structure and randomness 
that are invariant under transformations of the form ([!]). The only feature of a matrix that 
is preserved under such transformations is the relative ordering of its entries, as < Cm 

'Note that the matrix C = f(A) in Figure lb is also a random matrix, with i.i.d. entries drawn from the 
transformed distribution. Although the eigenvalue spectrum still converges to the Wigner semicircle distribution in 
the limit of large N (6), the rate of convergence depends on the nonlinearity /, allowing for large deviations from 
the semicircle distribution as compared to a normally-distributed random matrix with the same N. 
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Figure 1: Order-based analysis of symmetric matrices, (a) A symmetric matrix A is related to another 
matrix C via a nonlinear monotonically increasing function f(x), applied entry wise, (b) (Left) Distribu¬ 
tion of eigenvalues for a random symmetric N X N matrix A , whose entries were drawn independently 
from the normal distribution with zero mean and variance 1 /y/~N (N = 500). (Right) Distribution of 
eigenvalues for the transformed matrix with entries C V] = f{Ajj), for f(x) = 1 — e~ 30x . Red curves 
show Wigner’s semicircle distribution with matching mean and variance, (c) (Top) The order complex 
of A is represented as a sequence of binary adjacency matrices, indexed by the density p of non-zero 
entries. (Bottom) Graphs corresponding to the adjacency matrices. Minimal examples of a 1-cycle (yel¬ 
low square), a 2-cycle (red octahedron) and a 3-cycle (blue orthoplex) appear at p = 0.1, 0.25, and 0.45, 
respectively, (d) At edge density po, there are no cycles. Cliques of size 3 and 4 are depicted with light 
and dark gray shading. As the edge density increases, a new 1-cycle (yellow) is created, persists, and 
is eventually destroyed at densities p f >2 ■ and p 3 , respectively, (e) For a distribution of 1000 random 
N x N symmetric matrices (N = 88), average Betti curves /3i(p), /^(p), and /T>(p) are shown (yellow, 
red, and blue dashed curves), together with 95% confidence intervals (shaded areas). 

whenever A tJ < A kf (see Supplementary Text). We refer to this combinatorial information as 
the order complex , orcl(C'). It is convenient to represent the order complex as a nested sequence 
of graphs, where each subsequent graph includes an additional edge ( ij ) corresponding to the 
next-largest matrix entry C^ (Figure lc). Any quantity computed from the order complex is 
automatically invariant under the transformations (]Tj), since ord(3L) = ord(C'). 

Perhaps surprisingly, the arrangement of cliques (all-to-all connected subgraphs) in the or¬ 
der complex of a matrix can be used in lieu of eigenvalues to detect random or geometric 
structure. Clique topology measures how cliques fit together and overlap. This can be quanti¬ 
fied by counting non-contractible cycles, or “holes,” that remain after all cliques in a graph have 
been “filled in” (Figure lc). As the edge density p increases, new cycles are created, modified, 
and eventually destroyed (Figure Id). One can track these changes by computing a set of Betti 
numbers (7, 8), /3 m , which count the independent m-cycles. The Betti numbers across all graphs 
in an order complex yield Betti curves, (3 m (p) (see Methods and Supplementary Text). 
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Although the details of individual graphs in the order complex may be sensitive to noise in 
the matrix entries, we found that clique topology provides robust signatures of randomness. In 
the case of a random symmetric matrix with i.i.d. entries, the corresponding order complex is a 
sequence of Erdos-Renyi random graphs. We found that the Betti curves 8 m (f>) are remarkably 
reliable for such matrices (Figure le), and display a characteristic unimodal shape with peak 
values that increase with m (m <C N ). This reliability has been theoretically predicted (9, 
10), and makes it possible to robustly distinguish random from non-random structure in the 
presence of a monotone nonlinearity ([!]). Unsurprisingly, correlation matrices obtained from 
finite samples of N independent random variables display the same characteristic Betti curves 
as random symmetric N x N matrices (Supplementary Figure 2). Note that computing low¬ 
dimensional (m < 3) Betti curves for matrices of size N~100 is numerically tractable due to 
recent advances in computational topology (8, 11,12). 

If a correlation or connectivity matrix is not random, is there specific structure one can de¬ 
tect? Uncovering geometric organization is especially important in the context of neuroscience. 
Neurons tuned to features that lie in a continuous coding space, such as orientation-tuned neu¬ 
rons (13) or hippocampal place cells (/), have correlations that decrease with distance. The ge¬ 
ometry of the coding space may therefore be detectable in the pattern of pairwise correlations, 
even if the nonlinear relationship between distances and correlations is unknown. Fortunately, 
the ordering of matrix entries encodes geometric features, such as dimension (Figure 2a). 

For larger matrices, the precise dimension may be difficult to discern in the presence of 
noise. Nevertheless, the organization of cliques in the order complex carries signatures of an 
underlying Euclidean geometry, irrespective of dimension. For example, the triangle inequality, 
|| x — z\\ < ||x — y\\ + \\y — z\\, implies that if two edges of a triangle are present at some 
edge density p, there is a higher probability of the third edge also being present. Intuitively, this 
means that cliques in the order complex will be more prominent for geometric as compared to 
random matrices, and cycles (Figure lc,d) will be comparatively short-lived, as cliques cause 
holes to be more readily filled (14). 

To see whether clique topology can provide reliable signatures of geometric organization, 
we computed Betti curves for distributions of geometric matrices (N = 88), generated from 
random points uniformly sampled from unit cubes of dimensions d = 5,10,16, 24, and 88, 
and having entries that decrease with distance (see Methods). We then computed average Betti 
curves /?i(p), /M/ 7 ), ar| d ih(p) for each d, and found that they are stratified by dimension but 
retain characteristic features that are independent of dimension (d < N). In particular, the 
peak values of geometric Betti curves are considerably smaller than those of random symmetric 
matrices with matching parameters (p < 0.001), and decrease with increasing m (Figure 2b). 
We conclude that Betti curves can, in principle, be used to distinguish geometric from random 
structure. 

Can clique topology be used to detect geometric organization from pairwise correlations in 
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Figure 2: Geometric structure is encoded in the ordering of matrix entries, (a) Three 5x5 symmet¬ 
ric matrices with distinct order complexes; the ten off-diagonal matrix values in each are ordered from 0 
to 9. (Left) An ordering of matrix values that can be obtained from an arrangement of points p, on a line, 
so that Aij < Af,( whenever [p, — Pj\\ < \\pk ~ Pt\\- (Middle) An ordering that arises from distances 
between points in the plane, but cannot be obtained from a one-dimensional point arrangement. (Right) 
A matrix ordering that cannot arise from distances between points in one or two dimensions, (b) Betti 
curves for distributions of geometric matrices (N = 88) in dimensions d = 5,10,16, 24, and 88. Mean 
Betti curves /3i(p), fh(p), and Ps(p) are shown (yellow, red, and blue curves), with darker (and higher) 
curves corresponding to larger d. 


noisy neural data? To answer this question, we examined correlations of hippocampal place 
cells in rodents during spatial navigation in a two-dimensional open field environment. In this 
context, geometric structure is expected due to the existence of spatially localized receptive 
fields (place fields (/)). but has not previously been detected intrinsically using only the pattern 
of correlations. We computed correlations from spike trains of simultaneously recorded neurons 
in area CA1 of dorsal hippocampus (see Supplementary Methods). Each pairwise correlation, 
Cij , was obtained from the mean of a cross-correlogram on a timescale of r max = Is (see 
Methods; see also Supplementary Figure 3). The resulting matrix was then analyzed using 
clique topology (Figure 3a). As expected, the Betti curves from place cell data were in close 
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agreement to those of geometric matrices (Figure 3b, top), up to a small rightward shift that is 
likely due to noise (Supplementary Figure 4). 
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Figure 3: Geometric structure of correlations for neurons with spatial receptive fields, (a) Betti 
curves of the pairwise correlation matrix for the activity of N = 88 place cells in hippocampus during 
open field spatial exploration, (b) (Top) Betti curves from panel (a) (bold lines) overlaid on the mean 
geometric Betti curves from Figure 2b. (Bottom) Comparison of Betti curves from panel (a) to those 
of shuffled correlation matrices (note the change in vertical scale), (c) Integrated Betti values B m for 
the curves in panel (a) (solid lines), compared to standard box plots of integrated Betti values for the 
1000 shuffled [s] and geometric [g] controls displayed in (b). The geometric box plots are shown for the 
highest dimension, d = 88, while the shaded area indicates the confidence interval across all dimensions 
d < 88. Betti values for the place cell data are significantly non-random (*, p < 0.001), and appear 
consistent with those of geometric matrices, (d) Percentage of non-geometric Betti values Bi, B 2 , B 3 for 
a range of correlation timescales r max . Each point is an average over 9 open field recordings. The arrow 
indicates the timescale considered in (a)-(c). (e) (Left) A place field together with a cartoon trajectory 
(white) and simulated spike train (bottom). (Right) Integrated Betti values for correlations in the place 
field model (bold lines, labelled PF) lie within the geometric regime, (f) (Left) A scrambled version of the 
place field in (e). (Right) Integrated Betti values for the scrambled PF model (bold lines) are significantly 
non-geometric (*, p < 0.05) for 62 and B:$, while Bt is in the geometric regime. The Betti values are also 
significantly smaller than those of shuffled controls (*, p < 0.05). Box plots for geometric and shuffled 
controls in (e-f) are the same as in (c). 
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We also compared the data Betti curves to shuffled controls, obtained by randomly permut¬ 
ing the matrix entries (Supplementary Figure 5a,b). Shuffling completely destroys any structure 
in the order complex, yielding distributions of Betti curves identical to those of random matrices 
(Figure le). We found that the Betti curves from place cell data were an order of magnitude 
smaller than the mean Betti curves of the shuffled matrices, and well outside the 95% confi¬ 
dence intervals (Figure 3b, bottom). To quantify the significance of non-random structure, we 


used integrated the Betti values 



and verified that they were significantly smaller than those obtained from 1000 trials of the 
shuffled controls (p<0.001), but well within the confidence intervals for geometric controls 
(Figure 3c). 

To test whether the observed geometric organization was consistent across animals and 
recording sessions, we repeated these analyses for 8 additional data sets from three different 
animals during spatial navigation (Supplementary Figure 6). All but one of the 9 data sets were 
consistent with the corresponding geometric controls, suggesting that geometric structure of 
correlations is a robust phenomenon during spatial navigation. We also repeated the analyses 
for different choices of the correlation timescale, r max , ranging from 10 ms to 2 s, and observed 
similar results (Figure 3d). As a further test of geometric organization, we computed the distri¬ 
bution of persistence lifetimes from the order complex of the open field correlation matrix (see 
Supplementary Text). The lifetime measures how long a hole persists as it evolves from one 
graph to the next in the order complex (Figure Id). Again, the data exhibited topological sig¬ 
natures that were far from random, but consistent with geometric organization (Supplementary 
Figure 7). 

To ensure that the observed correlation structure could not be explained by the differences in 
interactions of individual neurons with the “mean field” activity of the network, we performed 
an additional random control that preserves row and column sums of pairwise correlation ma¬ 
trices. Specifically, we computed Betti curves for matrices drawn from a weighted maximum 
entropy (WME) distribution, subject to the constraint that expected row sums match the origi¬ 
nal pairwise correlation matrix (Supplementary Figure 5c,d). The Betti curves and persistence 
lifetimes of the WME controls were similar to those of random symmetric matrices (Supple¬ 
mentary Figure 8), showing that the non-random structure in the data does not arise from the 
fact that some neurons have higher levels of correlation with the population as a whole. 


Are the spatial coding properties of place cells sufficient to account for the observed geomet¬ 
ric organization of correlations during spatial navigation? Or, alternatively, does this structure 
reflect finer features of the correlations, beyond what is expected from place fields alone? To ad¬ 
dress this question, we computed place fields iq(x) for each place cell from the same data used 
in Figure 3a, together with the animal’s two-dimensional spatial trajectory x(f) (see Supple¬ 
mentary Methods). We then generated synthetic spike trains for each neuron as inhomogeneous 
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Poisson processes, with rate functions rj(i) given by the simple place field model 

ri(t) = Ft (x(t)), (2) 

where x(t) is the animal’s actual trajectory (see Figure 3e). By design, the synthetic spike trains 
preserved the influence of place fields, but discarded all other features of the data, including 
precise spike timing and any non-spatial correlates. Perhaps unsurprisingly, Betti curves derived 
from the place field model reproduced all the signatures of geometric organization (Figure 3e; 
see also Supplementary Figure 9b,c), indicating that place fields alone could account for the 
results observed in the open field data. 

We next asked whether the geometry of place fields was necessary, or if the Betti curves 
during spatial navigation could be attributed to an even more basic feature of the data, which 
is that each neuron is driven by the same global signal, x(f), filtered by a cell-specific function 
F ? ;(x). To answer this question, we scrambled each place field by permuting the values of F ?: (x) 
inside “pixels” of a 100 x 100 grid, creating non-geometric receptive fields F ?; (x) (Supplemen¬ 
tary Figure 9d; a 10 x 10 scrambling is shown in Figure 3f for clarity). We then generated spike 
trains from the actual trajectory, as in equation ([2]), but using the scrambled place fields F, (x). 
For this model, we found that the second and third Betti curves were far outside of the geometric 
regime, while the first Betti curve (3i ( p ) was insufficient to rule out geometric organization (Fig¬ 
ure 3f; see also Supplementary Figure 9e-h). We obtained similar results after scrambling on a 
10 x 10 grid (Supplementary Figure 10). We conclude that the geometric signatures observed 
during spatial navigation reflect the geometry of place fields, and are not simply a consequence 
of neurons being driven by the same global signal, x(t). Nevertheless, each of the Betti curves 
for the scrambled place field model was also significantly smaller than those of random controls 
(Figure 3f; p<0.001), suggesting that neurons controlled by a global signal via non-geometric 
receptive fields do exhibit non-random structure in their pairwise correlations. 

The above results suggest that geometric structure in place cell correlations is a consequence 
of position coding, and is not necessarily expected during non-spatial behaviors. To see if this 
is true, we repeated our analyses on neural activity recorded during two non-spatial conditions: 
wheel running and REM sleep. Surprisingly, we found that the Betti curves were again highly 
non-random (Supplementary Figure 11), and consistent with geometric organization across all 5 
wheel running recordings and 3 out of 4 sleep recordings (Figure 4). These findings suggest that 
geometric organization on a timescale of r max ~ Is is a property of the underlying hippocampal 
network, and not merely a byproduct of spatially structured inputs. At much finer timescales, 
however, geometric features appear to deteriorate in both REM sleep and wheel-running con¬ 
ditions (Supplementary Figure 12), in contrast to the open field data (Supplementary Figure 
13). 

Discussion. We have developed a novel tool for detecting structural features of symmetric 
matrices that are invariant under nonlinear monotone transformations. We have shown that this 


8 


a wheel running 



N = 77 N = 73 N = 83 N = 67 N = 64 


b REM sleep 



N = 67 N = 69 N = 60" 


■ 

I 

I 

1 


* 


N = 61 


Figure 4: Geometric organization in hippocampus during non-spatial behaviors, (a) Integrated 
Betti values fii, 62 , and 0:$ (bold yellow, red, and blue lines) for 5 recordings from 2 animals, during 
wheel running. N indicates the number of neurons in each recording. Box plots indicate the distributions 
of Betti values for 100 geometric controls with matching N and dimension d = N . Shaded regions 
indicate confidence intervals for the full geometric regime, with d < N. (b) Integrated Betti values for 
4 recordings from 2 animals, during REM sleep. One Betti value was significantly non-geometric (*, 
p < 0.05). 

method can reliably detect both geometric and random structure in the presence of an unknown 
nonlinearity. Our approach exploits the little-known fact that the ordering of matrix entries, 
irrespective of their actual values, carries significant information about the underlying matrix 
organization. Unlike eigenvalues, which can be badly distorted by monotone nonlinearities, the 
information encoded in the order complex is invariant. Applying techniques from computational 
topology, relevant features can be extracted from the order complex that enable robust detection 
of geometric (or random) structure. Unlike previous instances of topological data analysis (15- 
20), our method relies on the statistical properties of cycles, as captured by Betti curves and 
persistence lifetime distributions, and is used as a generic tool for matrix analysis, rather than 
the analysis of point cloud data. 

We found that geometric organization of hippocampal place cell activity - a prerequisite 
for the existence of spatial receptive fields - can be detected from pairwise correlations alone, 
without any a priori knowledge about the nature of receptive fields. Using simulated data from 
a model, we confirmed that such geometric structure would be observed as a result of realistic 
place fields, but would not arise from non-geometric (“scrambled”) place fields. Perhaps sur¬ 
prisingly, we also found geometric organization in correlations during wheel running and REM 
sleep. We suggest that clique topology is a powerful new tool for matrix analysis, and one that is 
especially useful in biological settings, to detect relevant structure in the presence of unknown 
nonlinearities. 
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Methods 


Order complex. For any N x N symmetric matrix A with distinct entries, the order complex 
ord(A) is a sequence of graphs G 0 C G\ C • • • C G p , where 6' 0 is the graph having N vertices 
and no edges, G\ has a single edge (ij) corresponding to the highest off-diagonal matrix value 
Aij, and each subsequent graph has an additional edge for the next-highest off-diagonal matrix 
entry. The graphs {Gk } can also be indexed by the edge density, p = k/ Q) G [0, 1], where k 
is the number of edges in the graph Gk- 

Betti curves. A clique in a graph G is an all-to-all connected set of vertices in G. For each 
graph G in the order complex ord(A), we compute simplicial homology groups H m {X(G), Z 2 ) 
for m — 1,2 and 3, where X(G) is the clique complex of the graph G. We refer to this topo¬ 
logical information as the clique topology of G, to distinguish it from the usual graph topology. 
The dimensions of the homology groups yield the Betti numbers, (3 m = dim H m (X(G), Z 2 ). 
Indexing the graphs by edge density p, we organize the Betti numbers across all graphs in the 
order complex into Betti curves /3i(p),/3 2 (p), and /3 3 (p). The Betti curves provide a summary 
of the topological features of the matrix A. 

Random and geometric matrices. Random/shuffled matrices were created by randomly per¬ 
muting the (^) off-diagonal elements of C. This is equivalent to considering random symmetric 
matrices with i.i.d. entries, whose corresponding order complex is a sequence of nested Erdos- 
Renyi random graphs. Geometric matrices were obtained by sampling a set of N i.i.d. points 
uniformly distributed in the d-dimensional unit cube [0, l] rf C W 1 . for d < N. The matrix entries 
were then given by C i3 = —\\p t — p 3 \\, where the minus sign ensures that they monotonically 
decrease with distance, as expected for geometrically organized correlations. 

Computation of pairwise correlation matrices. Given a set of simultaneously recorded 
spike trains from N neurons, we computed cross-correlograms ccg l3 (r) for each pair of neu¬ 
rons i,j. These functions were then normalized by the average firing rates of the neurons, and 
integrated over a timescale r max in order to obtain an N x N matrix of pairwise correlations 
Cij. See Supplementary Methods and Supplementary Figure 3 for more details. 
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Supplementary Methods 

Data and preprocessing 

Experimental data. Spike trains of neurons in area CA1 of rodent hippocampus were recorded 
during three behavioral conditions: (i) spatial navigation, (ii) wheel running and (iii) REM 
sleep. Experimental procedures have been previously described in (21, 22). Spatial navigation 
was in a familiar, two-dimensional, 1.5m x 1.5m square box environment. Wheel running was 
in the context of a delayed alternation task, as described in (21,23). Periods of REM sleep were 
detected from the local field potential using the ratio of total delta power (0.1-3 Hz) to total 
theta power (5-10 Hz) in the spectrogram of the EEG (24). Periods with delta/theta ratio less 
than 1 were considered REM sleep. 

Selection criteria for cells and recordings. During each of the three behavioral conditions, 
only putative pyramidal cells whose average firing rates were in the 0.2-7 Hz range were used. 
Putative interneurons, defined as having an average firing rate above 7 Hz over an entire record¬ 
ing session, were excluded. Recordings with at least N — 60 neurons satisfying these criteria 
were selected for the analyses. A total of 18 recordings from 5 animals met the selection crite¬ 
ria. These consisted of 9 “open field” spatial navigation data sets from three animals, 5 wheel 
running data sets from two animals, and 4 REM sleep data sets from two animals. The num¬ 
ber of cells and length of each recording were: N = 76 (16 min), N=81 (16 min), N=72 (63 
min), N=88 (63 min), N=68 (59 min), N=64 (50 min), N=67 (50 min), N=74 (59 min), and 
N=66 (60 min) for spatial navigation; N=77 (26 runs, 6 min), N=73 (25 runs, 5 min), N=83 (13 
runs, 5 min), N=67 (7 runs, 2 min), and N=64 (10 runs, 3 min) for wheel running; and N=67 
(30 s), N=69 (4 min), N=60 (2.5 min), and N=61 (3.5 min) for REM sleep. The N=88 spatial 
navigation data set was used in Figure 3. 

Computation of the pairwise correlation matrices. The cross-correlograms were computed 
as ccgjj (t) = f 0 T fi{t)fj(t + r)df, where /, (f) is the firing rate of the i-th neuron and T is the 
total duration of the considered time period. The normalized cross-correlation on a timescale of 
r max was computed as 

| / CTmax /‘I’max \ 

Cij = - max ( / ccgiA t) dr, / ccg Jt)(1t , 

TmaxA'O o Jo J 

where r, is the average firing rate of the z-th neuron (see Supplementary Figure 3). 

Simulated spike train data from PF and scrambled PF models. For each cell in the N=88 
spatial navigation data set, place fields T)(x) were computed using a 100 x 100 grid of pixels. 
In each pixel, the number of spike events for each cell was normalized by the time spent at 
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that location, and then smoothed with a 2-dimensional Gaussian (cr = 5 grid locations, see also 
Supplementary Figure 9). Simulated spike trains for the PF model were generated from the 
place fields as inhomogeneous Poisson processes with rate functions r*(f) = Fj(x(£)), using 
the animal’s original trajectory, x(f). 

Scrambled place fields F, (x) were computed by randomly permuting the pixels in the 100 
x 100 grid. A different permutation was used for each place field, in order to destroy the co¬ 
herence of the spatial organization across the population. Similar spike trains for the scrambled 
PF model were again generated as inhomogeneous Poisson processes using the original trajec¬ 
tory, x(t), but with rate functions r,,(t) = F)(x(f)) given by the scrambled place fields (see 
Supplementary Figure 9). The simulated data sets were analyzed in Figure 3 at the correlation 
timescale r max = lsec. 


Random and geometric matrices 

For each symmetric matrix C we consider three types of controls: shuffled (or “random”) con¬ 
trol matrices, weighted-maximum-entropy (WME) control matrices, and geometric matrices. 

Shuffled matrices were created by randomly permuting the ( 9 ) off-diagonal elements of 
C. Because only the ordering of matrix elements is considered in the subsequent topological 
analyses, this is equivalent to considering random symmetric matrices with i.i.d. entries, whose 
corresponding order complex is a sequence of nested Erdos-Renyi random graphs. 

WME matrices were obtained by sampling the maximum entropy distribution on weighted 
graphs with constrained mean degree sequence induced by C. This distribution was previously 
described in (25) and is estimated by summing the rows of C. The distribution on each element 
(i , j) in the matrix is exponential with mean The parameters 0, were obtained by solving 

i \ w j 

the system of equations 


y 1 

h (>i + 


ye,,, fori = 1,..., N, 
j¥=i 


using standard gradient descent methods (25). 

Geometric matrices were obtained by sampling a set of N i.i.d. points uniformly dis¬ 
tributed in the d-dimensional unit cube |0, l] d C W 1 . for d < N. The matrix entries were then 
given by C l} = — \\p% — Pj\\, where the minus sign ensures that they monotonically decrease 
with distance, as expected for geometrically organized correlations. 


Clique topology 

We performed topological data analysis on pairwise correlation matrices. Here we describe the 
general procedure; for more detailed explanations, see the Supplementary Text. Our approach 
relies on the statistical properties of homology cycles, as captured by Betti curves, in order to 
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detect structure (or randomness) in symmetric matrices. Note that even for data reflecting a 
two-dimensional space, the higher-dimensional Betti curves (3 2 (p) and (3 3 (p) were crucial for 
detecting the presence or absence of geometric organization (Figure 3), while (3i (p) was less 
informative @ 


Order complex. For any N x N symmetric matrix A with distinct entries, the order complex 
ord(A) is a sequence of graphs G 0 C Gi C • • • C G p , where G 0 is the graph having N vertices 
and no edges, G i has a single edge ( ij) corresponding to the highest off-diagonal matrix value 
Aij, and each subsequent graph has an additional edge for the next-highest off-diagonal matrix 
entry. The graphs { G k } can also be indexed by the edge density, p = k/ (vf) 6 [0,1], where k 
is the number of edges in the graph G k . 

Betti curves. A clique in a graph G is an all-to-all connected set of vertices in G. For each 
graph G in the order complex ord(A), we compute simplicial homology groups H m (X(G). Z 2 ) 
for m = 1, 2 and 3, where X(G) is the clique complex of the graph G. We call this the clique 
topology of G, to distinguish it from the usual graph topology. The dimensions of the homology 
groups H m (X(G), Z 2 ), yield the Betti numbers (3 m . Indexing the graphs by edge density p, we 
organize the Betti numbers across all graphs in the order complex into Betti curves 3\ (p), /3 2 (p), 
and (p). The Betti curves provide a summary of the topological features of the matrix A. 

To compute Betti curves for a matrix A, we begin by finding all maximal cliques of up to 
five vertices (those are needed to compute 3:3 for each graph G p , with p < 0.6. The result¬ 
ing lists are then input into Perseus, a computational topology software package implemented 
by Vidit Nanda (27); this software builds on work by Mishaikow and Nanda (28) using dis¬ 
crete Morse theory to reduce the sizes of simplicial complexes before performing persistent 
homology computations. All software used in this process is available in the Matlab package 
CliqueTop (29). 


Integrated Betti values. In order to facilitate the comparison of Betti curves to control ma¬ 
trices, we integrate the Betti curves with respect to graph density: 


3m = 3m (p) dp. 


The values fii, 3-i, and 3 : > were computed for each data set. For distributions of shuffled and 
geometric control Betti curves, the resulting integrated Betti values are summarized in box-and- 
whisker plots. We used standard box plots in Matlab, with bottom, middle, and top horizontal 
lines on the boxes denoting first quartile (Q\. 25%-ile), median (50 %-ile), and third quartile 

2 The Betti curve /3 0 (p), which we have not used here, counts the number of connected components in each 
clique complex, and may thus be useful for clustering (26). 
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(Q 3 , 75%-ile) boundaries in the distributions of integrated Betti values; while the bottom and 
the whiskers denote Q i — 1 . 5(0.3 — Q i) and Q.\ + 1 . 5(0.3 — Q\ ) respectively. 

Significance threshold. Our threshold for rejecting the geometric hypothesis for a given in¬ 
tegrated Betti value was obtained from the box-and-whisker plot for a distribution of 100 geo¬ 
metric matrices with matching N and dimension d — N. Specifically, we used the top whisker 
value, Q 3 + 1.5(<5 3 — Qi), as the significance threshold. The bottom whisker was not used, as 
Betti values lower than this are consistent with geometric matrices with smaller dimension d. In 
a normal distribution, 99.3 % of the data lie within the whiskers, so that less than 0.4 % of data 
points lie above the top whisker. Our integrated Betti values f3 m = f^/3 m (p)dp for geometric 
controls, however, are not normally-distributed. In the case of jdi and ri 2 , the top whisker corre¬ 
sponds, on average, to the 98 %-ile of the distribution. In the case of /5 3 , the top whisker is just 
under the 97 %-ile value. A data point above the top whisker is thus inconsistent with geomet¬ 
ric controls with p < 0.05. For comparisons against shuffled/random control distributions, we 
computed the p-value directly from the distribution, as in these cases we built the distributions 
from 1000 trials, rather than just 100. Note that clique topology computations are much faster 
for matrices with random structure than for geometric matrices, because of differences in the 
statistics of the cliques. 
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Supplementary Figures 
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Supplementary Figure 1: Spectral signatures of matrix structure are destroyed by non¬ 
linear monotonically increasing transformations, (a) A 100 x 100 symmetric matrix A with 
rank(yl) = 4. (b) The spectrum of A includes four nonzero eigenvalues that are all posi¬ 
tive. This is the signature that A has rank 4 and is positive semidefinite. (c) The graph of the 
monotonically increasing function f(x ) = (x — 0.2) 3 . (d) The spectrum of the matrix f(A) 
contains many nonzero eigenvalues. The spectral signature that A has low-rank structure has 
been destroyed by /. 
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Supplementary Figure 2: Topological properties of random correlation matrices are sim¬ 
ilar to those of random i.i.d. matrices. The N x N correlation matrix C V] = corr(A" ( , X :j ) 
was computed using 10, 000 samples of N = 88 independent uniformly distributed random 
variables X,. Each panel compares the Betti curves (top) and the persistence lifetimes (bottom) 
of the random correlation matrices to those of the random (shuffled) matrices. Each of the three 
black lines correspond to one instance of such a correlation matrix. Colored lines and shaded 
regions correspond to the mean curves and 95% confidence intervals for the random (shuffled) 
matrices. 
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Supplementary Figure 3: Computation of pairwise correlation matrices from spike train 
data, (a) For a pair of spike trains {t\}e=i... ni and {t\}e .=for neurons i and j (top), the 
cross-correlogram ccg tj (r) is computed as 

ccg ij(r) = f 1^ Mtffiit + r ) d ^ 

where f t (t) = YT 1 L 1 $(t — t\) is the instantaneous firing rate of the z-th neuron. The graph 
of a smoothed ccgj-(r) is displayed (black curve) along with the expected value of the cross- 
correlogram, r^j (red), for uncorrelated spike trains with matching firing rates, r, = rii/T. The 
pairwise correlations C l3 (blue line), with timescale r max , were computed as 

Cij = - maxi / ccg i7 -(r)dr, / ccg Jr)dT 

The timescale r max = 1 sec was used in all but one panel of Figures 3 and 4, while a range of 
timescales from 10 ms to 2 sec appears in Figure 3d. (b) The 88 x 88 matrix C for the spatial 
exploration data used in Figure 3a,b,c. The entry C\4 A r> corresponds to the cross-correlogram 
in panel (a). 
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Supplementary Figure 4: The rightward shift of the Betti curves can be explained by 
adding noise to “noiseless” geometric matrices. N — 70 random independent uniformly 
distributed points pi in a (-/-dimensional unit cube were sampled in dimensions d — 10 and 
d = N. Noised geometric matrices were obtained as 

Aij = -||Pi -pj || + ( vVd)g i:j , 

where the terms were independent and normally distributed with zero mean 
and unit variance, and the normalized noise strength v took the following values: 
0.01,0.02,0.03,0.04,0.05,0.1,0.25,0.5. Note that for v = 0.5 the matrix entries are domi¬ 
nated by the noise term, and A r] is close to a random matrix. Each panel compares the Betti 
curves ( d = 10 top, and d = N bottom) of the noised matrices (thin curves, stratified by the 
magnitude of the parameter v) to those of the random matrices (dashed lines) and also the zero 
noise geometric matrices (thick curves). Black lines connect the maximum values of the mean 
curves for the different levels of noise. Small amounts of noise produce a rightward shift in the 
peak values of the Betti curves, while curves for v = 0.5 are indistinguishable from those of 
random matrices. 
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Supplementary Figure 5: Two types of random controls, (a) The matrix CV, used in Figure 
3a. (b) A shuffled matrix obtained by randomly permuting the ( 88 ) off-diagonal elements of the 
symmetric matrix in panel (a), (c) A sample from the maximum entropy (WME) distribution 
on 88 x 88 symmetric matrices with prescribed expected values of row sums matching those of 
the matrix in panel (a). The distribution on each element (i, j ) in the matrix is exponential with 
mean (Hillar & Wibisono, 2013 http://arxiv.org/abs/1301.3321). The parameters (9, were 
obtained by solving the system of equations 

E^-=E C ‘- for * — 1,..., V 

using standard gradient descent methods (Hillar & Wibisono, 2013). (d) (left) The row sums 
Cij for the matrix in (a); (right) mean row sums for twenty samples from the distribution 
described in (c). 
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N=76 N=81 N=72 N=88 N=68 N=64 N=67 N=74 N=66 


Supplementary Figure 6: Integrated Betti values obtained from neural activity during 
spatial navigation are consistent with those of geometric matrices. Integrated Betti val¬ 
ues from place cell data are compared to geometric distributions with matching N across nine 
recordings of rat hippocampus during spatial navigation, obtained from three animals. The ge¬ 
ometric box plots are shown for the dimension, d = N, while the shaded area indicates the 
confidence interval across the dimensions d < N. The number N of neurons is displayed in 
color (black, green, and purple) to indicate recordings from the same animal. Betti values for the 
place cell data are consistent with those of geometric matrices in all but one data set (N = 68). 
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Supplementary Figure 7: The persistence lifetime distributions, computed from neural 
activity during spatial navigation resemble the geometric controls but are significantly be¬ 
low those of the shuffled matrices. Persistence lifetime distributions, £\{p) (yellow), (£>(p) 
(red) and £s(p) (blue), computed from the same data set and controls as those used in Figure 
3b. (Top) The lifetime distributions for the data (solid lines) fall off quickly, while those of the 
shuffled matrices are much broader (dashed lines are means over 1000 trials; shading shows the 
95% confidence intervals). (Bottom) Here the data lifetime distributions (solid lines) are over¬ 
laid with the mean distributions (faint lines) for 1000 geometric matrices in each dimension 
d — 5,10,16, 24, and 88. As with the geometric Betti curves, the persistence lifetime distribu¬ 
tions for geometric matrices are stratified by dimension, with the top curves corresponding to 
the highest dimension. 
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Supplementary Figure 8: Clique topology of WME matrices, sampled from the maximum 
entropy distribution with prescribed row sums, is similar to that of random (shuffled) 
matrices. Comparison of Betti curves (top) and persistence lifetimes (bottom) for WME ma¬ 
trices computed using the matrix C VJ used in Figure 3a. Each panel compares the Betti curves 
(top) and the persistence lifetimes (bottom) of the WME matrices, computed using the matrix 
Cij used in Figure 3a, to those of the random (shuffled) obtained from the same C V} . Each of 
the three black lines correspond to one sampling of a WME matrix. Colored lines and shaded 
regions correspond to the mean curves and 95% confidence interval for the shuffled matrices. 


24 


















a 


0 


Place fields 


1 









Supplementary Figure 9: (a) Example of several place fields computed from spike trains during spatial 
exploration in the N=88 data set. (b) Betti curves (bold lines) computed from the simulated spike trains 
(r max = lsec) for the place field model versus the geometric Betti curves (thin lines stratified by dimen¬ 
sions). (c) Integrated Betti values (bold lines, labelled PF) for the curves in panel (b) lie in the geometric 
regime, in agreement with those of the original data, (d) The scrambled place fields corresponding to 
those in panel (a). These were obtained by sub-dividing the square into a 100 x 100 grid and randomly 
permuting each pixel. The permutations were independent for each cell, (e) Betti curves (bold lines) 
derived from the spike trains (r max = lsec) generated using the scrambled PF model versus the geo¬ 
metric Betti curves (thin lines stratified by dimensions), (f) Integrated Betti values from the scrambled 
PF model (bold lines, labelled scPF) lie outside of the the significance threshold (see Supplementary 
Methods) for the geometric regime for B 2 and B:\- (g) Betti curves (bold lines) derived from the spike 
trains (r max = lsec) generated using the Scrambte(l PF model are significantly smaller than those of 
shuffled controls, (h) Integrated Betti values derived from from the scrambled PF model (bold lines, la¬ 
belled scPF) are outside the 99.9% confidence intervals for the shuffled matrices. Box plots for shuffled 
matrices are the same as in Figure 3c. 











































Supplementary Figure 10: Betti curves and summary stats for the 10x10 grid scrambled 
place fields, (a) A single scrambled place field corresponding to the place field in Figure 3e. 
The scrambling is performed on a 10 x 10 grid. A cartoon trajectory (white) is displayed 
together with the corresponding spike train (bottom), (b) Betti curves from the scrambled place 
field model (bold lines) lie outside the geometric regime for u 2 and /i 3 . (c) Betti curves for the 
scrambled place field model are significantly smaller than the shuffled controls, (d) Integrated 
Betti values (bold lines, labelled scPF) for the scrambled place field model also lie outside of 
the significance threshold (see Supplementary Methods) for the geometric regime for 3 2 and 
while B\ is in the geometric regime, (e) Integrated Betti values (bold lines, labelled scPF) 
for the scrambled place field model are outside the 99.9% confidence intervals for the shuffled 
matrices. Box plots for geometric and shuffled matrices are the same as in Figure 3c. 
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Supplementary Figure 11: The clique topology of spike train correlations during wheel 
running and REM sleep is significantly non-random as compared to shuffled matrices. 

(a) Comparison of the data Betti curves (solid lines) for the spike trains during wheel running 
(N=73 in Figure 4a) to those of shuffled correlation matrices (dashed lines are means over 1000 
trials and shading shows 95% confidence intervals, as in Figure le). For each m = 1,2, 3, 
the data Betti curves are orders of magnitude smaller than those of the shuffled curves, (b) 
Persistence lifetime distributions (solid lines) for the same data as in (a), l\ (p) (yellow), £■>([)) 
(red) and f 3 (p) (blue). The lifetime distributions for the data fall off quickly, while those of 
the shuffled matrices are much broader (dashed lines are means over 1000 trials; shading shows 
the 95% confidence intervals), (c) Comparison of the Betti curves (solid lines) for spike train 
correlations during REM sleep (N= 67 in Figure 4b) to those of shuffled correlation matrices 
(dashed lines are means over 1000 trials and shading shows 95% confidence intervals, as in 
Figure le). For each m — 1,2, 3, the data Betti curves are orders of magnitude smaller than 
those of the shuffled curves, (d) Persistence lifetime distributions (solid lines) for the same data 
as in (c). The lifetime distributions for the data fall off quickly, while those of the shuffled 
matrices are much broader (dashed lines are means over 1000 trials; shading shows the 95% 
confidence intervals). 
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Supplementary Figure 12: Integrated Betti values across a range of correlation timescales 
in spatial navigation, wheel running, and REM sleep data sets. For each of the datasets, 
and each m = 1,2,3, the integrated Betti number B rn = J^/3 m (p)dp was normalized by its 
significance threshold b m = Q 3 + 1.5 x (Q 3 — Q i) (see Supplementary Methods) that was 
obtained from the distribution of the integrated geometric Betti curves /3!~% om = f^/3m° m (p)dp 
in dimension d = N, where N was the number of cells. Each of the curves (yellow, red and 
blue) correspond to the values of Bm/b m in dimensions m = 1, 2, 3 respectively. The dashed 
line marks the line j3 m = b m ; the appropriate integrated Betti numbers were deemed consistent 
with geometric distribution if they lay below this line. Note that the number N of neurons is 
displayed in color (black, green, purple, gray and orange) to indicate recordings from the same 
animal; there were a total of 5 animals. 
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Supplementary Figure 13: Percentage of integrated Betti values that were not consistent with 
geometric controls (i.e. above the significance threshold) across all considered data sets as a 
function of the correlation timescale, r max , used to compute the pairwise correlation matrix (see 
Methods). The arrow indicates the 1 s timescale used in the main figures. All three behav¬ 
ioral conditions, spatial navigation (black), wheel running (green) and REM sleep (pink), are 
consistent with geometric structure at timescales ranging from .5 s to 2 s. At finer timescales, 
however, the wheel running and REM sleep correlations are non-geometric, while the spatial 
navigation data remains consistent with geometric controls. 

For each timescale, we computed integrated Betti values /3i, (3 2 , and 3 :i for all data sets under 
three different conditions: (i) spatial navigation, (ii) wheel running, and (iii) REM sleep (see 
Supplementary Figure 12). For each behavioral condition, we counted how many Betti values 
were above the significance threshold for rejecting the geometric hypothesis at each timescale. 
Because our significance threshold rejects the geometric hypothesis at a rate of less than 5%, 
the p -value for a given condition and timescale satisfies 

p < E(ij( a05)f(1 - a05r "'' 

where k is the number of Betti values above the significance threshold, and m is the total number 
of Betti values. To obtain this upper bound on p- value, we used a binomial distribution with 
failure probability 0.05. Note that this assumes Betti values are independent. Although this is a 
reasonable assumption for Betti values from different data sets, Betti values from the same data 
set have statistical dependencies that are not well-understood. 
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